Remove clustalw, set fixed raxml memory #72

Open: natefoo wants to merge 2 commits into main

Conversation

@natefoo (Member) commented Oct 15, 2024

I am not sure what is going on here but:

  • AU allocates from 11.5 to 30.7 GB and multiple cores, although I don't believe the wrapper supports multiple cores (a quick way to check is sketched just after this list).
  • In the shared DB (and on .org) we allocate 34 GB.
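
Assuming the core job metrics plugin is enabled so that galaxy_slots is recorded (an assumption about each instance's config), a quick tally of the slot counts actually handed to clustalw jobs would settle the multi-core question:

gxadmin tsvquery tool-metrics 'toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1+galaxy1' galaxy_slots --ok | awk '{print $1}' | sort -n | uniq -c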

In practice on .org the histogram looks like:

ndc@galaxy-db% gxadmin tsvquery tool-metrics 'toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1+galaxy1' memory.peak --ok | awk '{print $1 / 1024 / 1024 / 1024}' | gxadmin filter histogram
(   0.221,    0.466) n=6472  **************************************************
[   0.466,    0.711) n=131   *
[   0.711,    0.956) n=86
[   0.956,    1.201) n=47
[   1.201,    1.446) n=24
[   1.446,    1.691) n=30
[   1.691,    1.936) n=22
[   1.936,    2.181) n=23
[   2.181,    2.426) n=8
[   2.426,    2.671) n=8
[   2.671,    2.916) n=4
[   2.916,    3.161) n=4
[   3.161,    3.406) n=1
[   3.406,    3.651) n=6
[   3.651,    3.896) n=2
[   3.896,    4.141) n=4
[   4.141,    4.386) n=1
[   4.386,    4.631) n=0
[   4.631,    4.876) n=0
[   4.876,    5.121) n=2
[   5.121,    5.367) n=0
[   5.367,    5.612) n=1
[   5.612,    5.857) n=1
[   5.857,    6.102) n=1
[   6.102,    6.347) n=2
[   6.347,    6.592) n=1
[   6.592,    6.837) n=0
[   6.837,    7.082) n=0
[   7.082,    7.327) n=1
[   7.327,    7.572) n=0
[   7.572,    7.817) n=1
[   7.817,    8.062) n=0
[   8.062,    8.307) n=0
[   8.307,    8.552) n=0
[   8.552,    8.797) n=0
[   8.797,    9.042) n=0
[   9.042,    9.287) n=0
[   9.287,    9.532) n=0
[   9.532,    9.777) n=0
[   9.777,   10.022) n=1

Maybe old versions were much more inefficient? But it does not seem like we need to allocate anything special for the current version. I would be interested to see what memory usage looks like at EU and AU.

Also, with larger inputs the tool runs forever; AU rejects anything >= 40 MB. I will update this if I can determine the rough input size at which it starts running forever for us; maybe that limit is worth including in the shared DB?
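
Relatedly, the core job metrics plugin records runtime_seconds, so the runtime spread can be eyeballed with the same pipeline (a hypothetical sketch; it only covers jobs that finished ok, so the runs-forever cases won't appear in it):

gxadmin tsvquery tool-metrics 'toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1+galaxy1' runtime_seconds --ok | awk '{print $1 / 3600}' | gxadmin filter histogram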

raxml wants 16 cores, but because of the memory factor that most people use in their default tool (see #73), it is probably also going to request ~64 GB of memory despite only using ~2 GB:

ndc@galaxy-db% gxadmin tsvquery tool-metrics 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2%' memory.peak --ok --like | awk '{print $1 / 1024 / 1024 / 1024}' | gxadmin filter histogram
(   0.226,    2.336) n=554   **************************************************
[   2.336,    4.446) n=8
[   4.446,    6.556) n=5
[   6.556,    8.666) n=5
[   8.666,   10.776) n=0
[  10.776,   12.887) n=1
[  12.887,   14.997) n=2
[  14.997,   17.107) n=0
[  17.107,   19.217) n=1
[  19.217,   21.327) n=0
[  21.327,   23.437) n=2
[  23.437,   25.548) n=1
[  25.548,   27.658) n=0
[  27.658,   29.768) n=0
[  29.768,   31.878) n=0
[  31.878,   33.988) n=1
[  33.988,   36.098) n=0
[  36.098,   38.208) n=1

@natefoo (Member, Author) commented Oct 15, 2024

I have very few "ok" clustalw jobs with inputs over 60 MB.
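
For the record, a sketch of how this can be measured straight from the Galaxy database, using the standard schema (job, job_to_input_dataset, history_dataset_association, dataset); the database name and the exact 60 MB cutoff here are illustrative:

psql galaxy -c "
SELECT count(*) FROM (
    SELECT j.id
    FROM job j
    JOIN job_to_input_dataset jtid ON jtid.job_id = j.id
    JOIN history_dataset_association hda ON hda.id = jtid.dataset_id
    JOIN dataset d ON d.id = hda.dataset_id
    WHERE j.tool_id = 'toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1+galaxy1'
      AND j.state = 'ok'
    GROUP BY j.id
    HAVING sum(d.total_size) >= 60 * 1024 * 1024
) AS big_ok_jobs;"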

@natefoo changed the title from "Remove clustalw" to "Remove clustalw, set fixed raxml memory" on Oct 15, 2024
@bgruening (Member) commented:

galaxy@sn06:~$ gxadmin tsvquery tool-metrics 'toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1+galaxy1' memory.peak --ok | awk '{print $1 / 1024 / 1024 / 1024}' | gxadmin filter histogram


(   0.215,    0.716) n=2712  **************************************************
[   0.716,    1.217) n=4     
[   1.217,    1.718) n=15    
[   1.718,    2.219) n=4     
[   2.219,    2.721) n=2     
[   2.721,    3.222) n=1     
[   3.222,    3.723) n=1     
[   3.723,    4.224) n=0     
[   4.224,    4.725) n=0     
[   4.725,    5.227) n=0     
[   5.227,    5.728) n=0     
[   5.728,    6.229) n=0     
[   6.229,    6.730) n=0     
[   6.730,    7.231) n=0     
[   7.231,    7.733) n=0     
[   7.733,    8.234) n=0     
[   8.234,    8.735) n=0     
[   8.735,    9.236) n=0     
[   9.236,    9.737) n=0     
[   9.737,   10.239) n=0     
[  10.239,   10.740) n=0     
[  10.740,   11.241) n=0     
[  11.241,   11.742) n=0     
[  11.742,   12.243) n=0     
[  12.243,   12.745) n=0     
[  12.745,   13.246) n=0     
[  13.246,   13.747) n=1     
[  13.747,   14.248) n=1     
[  14.248,   14.750) n=0     

@bgruening (Member) commented:

galaxy@sn06:~$ gxadmin tsvquery tool-metrics 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2%' memory.peak --ok --like | awk '{print $1 / 1024 / 1024 / 1024}' | gxadmin filter histogram
(   0.219,    0.558) n=57    **************************************************
[   0.558,    0.897) n=1     
[   0.897,    1.236) n=2     *
[   1.236,    1.575) n=1     
[   1.575,    1.914) n=3     **
[   1.914,    2.253) n=0     
[   2.253,    2.592) n=0     
[   2.592,    2.931) n=1     
[   2.931,    3.270) n=1     
[   3.270,    3.609) n=1    

@cat-bro (Collaborator) commented Oct 15, 2024

On AU, for clustalw:

$ gxadmin tsvquery tool-metrics 'toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1+galaxy1' memory.max_usage_in_bytes --ok | awk '{print $1 / 1024 / 1024 / 1024}' | gxadmin filter histogram
(   0.018,    1.083) n=1451  **************************************************
[   1.083,    2.147) n=14
[   2.147,    3.212) n=4
[   3.212,    4.277) n=2
[   4.277,    5.341) n=0
[   5.341,    6.406) n=0
[   6.406,    7.471) n=0
[   7.471,    8.535) n=0
[   8.535,    9.600) n=0
[   9.600,   10.665) n=0
[  10.665,   11.729) n=0
[  11.729,   12.794) n=0
[  12.794,   13.859) n=0
[  13.859,   14.923) n=1
[  14.923,   15.988) n=1
[  15.988,   17.053) n=0
[  17.053,   18.117) n=0
[  18.117,   19.182) n=0
[  19.182,   20.246) n=0
[  20.246,   21.311) n=0
[  21.311,   22.376) n=0
[  22.376,   23.440) n=0
[  23.440,   24.505) n=0
[  24.505,   25.570) n=1

@cat-bro (Collaborator) commented Oct 15, 2024

For the latest raxml:

cat@galaxy:~$ gxadmin tsvquery tool-metrics 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2.12+galaxy1' memory.max_usage_in_bytes --ok | awk '{print $1 / 1024 / 1024 / 1024}' | gxadmin filter histogram
(   0.030,    8.196) n=336   **************************************************
[   8.196,   16.361) n=0
[  16.361,   24.527) n=1
[  24.527,   32.692) n=0
[  32.692,   40.857) n=1
[  40.857,   49.023) n=1
[  49.023,   57.188) n=1
[  57.188,   65.354) n=0
[  65.354,   73.519) n=0
[  73.519,   81.684) n=0
[  81.684,   89.850) n=0
[  89.850,   98.015) n=0
[  98.015,  106.181) n=0
[ 106.181,  114.346) n=0
[ 114.346,  122.511) n=1

This gxadmin query/filter combo is extremely neat.
For some reason like isn't working for me: 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2%' like 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2.12+galaxy1' evaluates to false, but if I use ~ instead of like it evaluates to true.
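
One thing worth ruling out is operand order: SQL's test is string LIKE pattern, not the reverse, so swapping the operands flips the result. A quick check at any PostgreSQL prompt (plain psql, nothing Galaxy-specific):

psql -c "SELECT 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2.12+galaxy1' LIKE 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2%';"   # t
psql -c "SELECT 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2%' LIKE 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2.12+galaxy1';"  # f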

Thanks for this @natefoo! The clustalw settings here probably came from AU and may not have changed since the dynamic tool destinations (dtd) days.
